GrammaMT: Improving Machine Translation with Grammar-Informed In-Context Learning
Ramos, Rita, Chimoto, Everlyn Asiko, ter Hoeve, Maartje, Schluter, Natalie
We introduce GrammaMT, a grammatically-aware prompting approach for machine translation that uses Interlinear Glossed Text (IGT), a common form of linguistic description providing morphological and lexical annotations for source sentences. GrammaMT proposes three prompting strategies: gloss-shot, chain-gloss and model-gloss. All are training-free and require only a few examples that take minimal effort to collect, which makes them well-suited for low-resource setups. Experiments show that GrammaMT enhances translation performance on open-source instruction-tuned LLMs for various low- to high-resource languages across three benchmarks: (1) the largest IGT corpus, (2) the challenging 2023 SIGMORPHON Shared Task data over endangered languages, and (3) an out-of-domain setting with FLORES. Moreover, ablation studies reveal that leveraging gloss resources could substantially boost MT performance (by over 17 BLEU points) if LLMs accurately generate or access input sentence glosses.
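The abstract does not spell out the prompt format, so the following is only a rough sketch of what a gloss-informed few-shot prompt might look like. It assumes that each in-context example pairs a source sentence with its gloss and translation; the `GlossedExample` class, the `build_gloss_prompt` function, the field labels, and the Spanish example are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class GlossedExample:
    """One hypothetical IGT-style example: source, gloss line, translation."""
    source: str
    gloss: str
    translation: str


def build_gloss_prompt(examples, source_sentence, source_lang, target_lang="English"):
    """Assemble a few-shot prompt whose examples include gloss lines.

    The layout (Source / Gloss / Translation labels) is an assumption made
    for illustration; the paper's actual templates may differ.
    """
    lines = [f"Translate from {source_lang} to {target_lang}.", ""]
    for ex in examples:
        lines.append(f"Source: {ex.source}")
        lines.append(f"Gloss: {ex.gloss}")
        lines.append(f"Translation: {ex.translation}")
        lines.append("")
    # The sentence to translate comes last, with the answer left open.
    lines.append(f"Source: {source_sentence}")
    lines.append("Translation:")
    return "\n".join(lines)


# Illustrative example only; not taken from the paper's data.
shots = [GlossedExample("Los gatos duermen", "the.PL cat-PL sleep-3PL", "The cats sleep")]
prompt = build_gloss_prompt(shots, "Los perros comen", "Spanish")
```

The resulting string would be sent to an instruction-tuned LLM as-is; no training step is involved, which matches the training-free framing of the abstract.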
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search
Li, Chenglin, Chen, Qianglong, Li, Zhi, Tao, Feng, Li, Yicheng, Chen, Hao, Yu, Fei, Zhang, Yin
Instruction tuning is a crucial technique for aligning language models with humans' actual goals in the real world. Extensive research has highlighted that the quality of instruction data is essential for the success of this alignment. However, creating high-quality data manually is labor-intensive and time-consuming, which leads researchers to explore using LLMs to synthesize data. Recent studies have focused on using a stronger LLM to iteratively enhance existing instruction data, showing promising results. Nevertheless, previous work often lacks control over the evolution direction, resulting in high uncertainty in the data synthesis process and low-quality instructions. In this paper, we introduce IDEA-MCTS (Instruction Data Enhancement using Monte Carlo Tree Search), a general and scalable framework for efficiently synthesizing instructions. With tree search and evaluation models, it can efficiently guide each instruction to evolve into a high-quality form, aiding in instruction fine-tuning. Experimental results show that IDEA-MCTS significantly enhances the seed instruction data, raising the average evaluation scores of quality, diversity, and complexity from 2.19 to 3.81. Furthermore, in open-domain benchmarks, experimental results show that IDEA-MCTS improves the accuracy of real-world instruction-following skills in LLMs by an average of 5% in low-resource settings.
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Bonatti, Rogerio, Zhao, Dan, Bonacci, Francesco, Dupont, Dillon, Abdali, Sara, Li, Yinheng, Lu, Yadong, Wagle, Justin, Koishida, Kazuhito, Bucker, Arthur, Jang, Lawrence, Hui, Zack
Large language models (LLMs) show remarkable potential to act as computer agents, enhancing human productivity and software accessibility in multi-modal tasks that require planning and reasoning. However, measuring agent performance in realistic environments remains a challenge since: (i) most benchmarks are limited to specific modalities or domains (e.g. text-only, web navigation, Q&A, coding) and (ii) full benchmark evaluations are slow (on the order of days) given the multi-step sequential nature of tasks. To address these challenges, we introduce the Windows Agent Arena: a reproducible, general environment focusing exclusively on the Windows operating system (OS) where agents can operate freely within a real Windows OS and use the same wide range of applications, tools, and web browsers available to human users when solving tasks. We adapt the OSWorld framework (Xie et al., 2024) to create 150+ diverse Windows tasks across representative domains that require agent abilities in planning, screen understanding, and tool usage. Our benchmark is scalable and can be seamlessly parallelized in Azure for a full benchmark evaluation in as little as 20 minutes. To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi. Our agent achieves a success rate of 19.5% in the Windows domain, compared to the 74.5% performance of an unassisted human. Navi also demonstrates strong performance on another popular web-based benchmark, Mind2Web. We offer extensive quantitative and qualitative analysis of Navi's performance, and provide insights into the opportunities for future research in agent development and data generation using Windows Agent Arena. Webpage: https://microsoft.github.io/WindowsAgentArena Code: https://github.com/microsoft/WindowsAgentArena
Evaluation is all you need. Prompting Generative Large Language Models for Annotation Tasks in the Social Sciences. A Primer using Open Models
Weber, Maximilian, Reichardt, Merle
The advancement of Large Language Models (LLMs) has opened up new avenues for tackling annotation tasks in the field of social sciences. These models, especially the newer iterations like ChatGPT or GPT-4, are now being used to annotate textual data (Gilardi, Alizadeh, & Kubli, 2023; Heseltine & Hohenberg, 2023; Møller, Dalsgaard, Pera, & Aiello, 2023; Ziems et al., 2023), which can be helpful for analyzing various social and political phenomena (Törnberg, 2023; Ziems et al., 2023). However, a significant challenge arises with proprietary, closed models provided by companies: because they are accessed through APIs, using them requires sharing research data with those companies (Ollion, Shen, Macanovic, & Chatelain, 2023; Spirling, 2023). This is particularly concerning in scenarios where data sharing is not preferable due to data privacy. In light of this, open models, which can be operated on independent devices like university servers, present a viable alternative (Alizadeh et al., 2023). They allow researchers to harness the potential of generative large language models without compromising data security. This paper endeavors to promote the adoption of open models by providing two examples and guidelines for leveraging them instead of proprietary models for annotation tasks within the social sciences.
Meet CIMON, the 1st Robot with Artificial Intelligence to Fly in Space
A beautiful space-exploration friendship between human and machine may have just begun. Early this morning (June 29), a small robot endowed with artificial intelligence (AI) launched on a two-day trip to the International Space Station aboard SpaceX's Dragon cargo capsule. No other AI-equipped machine has ever flown to space before, project team members said. The mission of the bantam astronaut assistant -- known as CIMON, short for "Crew Interactive Mobile Companion" -- is relatively short and modest. But its work off-Earth could help pave the way for some pretty big things, according to NASA officials.
[D] I'm new here, and I really don't know where to start. Please help. • r/MachineLearning
I'm new here and I would like to know where I can start to learn about machine learning and gradually move up the ladder to eventually do research in deep learning. My singular goal is to learn and contribute towards machine learning and even apply it to solve problems. I get really excited reading all the research posted in this subreddit but most of it just flies over my head.
Please help me find a good book on neural networks? • /r/MachineLearning
So I am trying to implement my first neural network. Went for Tanh implementation (so all nodes go from -1 to 1) because most documentation stated that it is superior to linear 0-1 or the same thing as tanh with 0-1 output. I am having a hard time grasping most documentation because it generates questions I can't resolve with the documentation, and all the big algebra stuff you usually find on Wikipedia is voodoo to me most of the time. It explained backprop and error calculation well. However the math doesn't seem to add up on it.
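For readers following along, here is a minimal sketch of how tanh and its derivative fit into one backprop step. The post does not describe the network or the error function, so this assumes a single neuron with one input and a squared-error loss; `train_step`, the learning rate, and the starting weights are illustrative choices only.

```python
import math


def tanh(x: float) -> float:
    """tanh squashes any input into the open interval (-1, 1)."""
    return math.tanh(x)


def tanh_derivative_from_output(y: float) -> float:
    """Derivative of tanh written in terms of its output y = tanh(x): 1 - y^2."""
    return 1.0 - y * y


def train_step(w: float, b: float, x: float, target: float, lr: float = 0.1):
    """One gradient-descent step for a single tanh neuron.

    Loss is 0.5 * (y - target)^2, so the chain rule gives
    dLoss/dw = (y - target) * (1 - y^2) * x and
    dLoss/db = (y - target) * (1 - y^2).
    """
    y = tanh(w * x + b)
    delta = (y - target) * tanh_derivative_from_output(y)
    return w - lr * delta * x, b - lr * delta


# Train the neuron to output 0.8 for input 1.0 (arbitrary illustrative values).
w, b = 0.5, 0.0
for _ in range(500):
    w, b = train_step(w, b, x=1.0, target=0.8)
output = tanh(w * 1.0 + b)
```

The part that usually causes confusion is `delta`: the raw error is multiplied by the activation's slope at the current output, which is why tanh's derivative is written as `1 - y * y` instead of being recomputed from the input.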